Memory
Analog In-memory Training on General Non-ideal Resistive Elements: The Impact of Response Functions
As the economic and environmental costs of training and deploying large vision or language models increase dramatically, analog in-memory computing (AIMC) has emerged as a promising energy-efficient solution. However, training on AIMC hardware, and especially its training dynamics, remains underexplored. In AIMC hardware, the trainable weights are represented by the conductances of resistive elements and updated by applying consecutive electrical pulses. Ideally, the conductance changes by a constant amount in response to each pulse. In reality, the change is scaled by asymmetric and non-linear response functions, leading to non-ideal training dynamics. This paper provides a theoretical foundation for gradient-based training on AIMC hardware with non-ideal response functions.
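The following minimal Python sketch makes the non-ideal update concrete. It assumes a soft-bounds device model. This is a common illustrative choice and not necessarily the response functions analyzed in the paper. Under this model, an up pulse changes the weight by an amount scaled by `response_up(w) = 1 - w/tau`, and a down pulse by `response_down(w) = 1 + w/tau`. The two responses agree only at the symmetric point `w = 0`. The function names, the quadratic objective, the learning rate, and the noise level are all hypothetical.

```python
import random

def response_up(w: float, tau: float = 1.0) -> float:
    # Assumed soft-bounds response to a potentiation (up) pulse: shrinks toward 0 as w -> +tau.
    return max(0.0, 1.0 - w / tau)

def response_down(w: float, tau: float = 1.0) -> float:
    # Assumed soft-bounds response to a depression (down) pulse: shrinks toward 0 as w -> -tau.
    return max(0.0, 1.0 + w / tau)

def train(ideal: bool, w_star: float = 0.5, lr: float = 0.01, noise: float = 1.0,
          steps: int = 200_000, burn_in: int = 20_000, seed: int = 0) -> float:
    """Noisy SGD on f(w) = 0.5 * (w - w_star)**2; returns the weight averaged after burn-in."""
    rng = random.Random(seed)
    w, total = 0.0, 0.0
    for t in range(steps):
        grad = (w - w_star) + rng.gauss(0.0, noise)  # stochastic gradient
        delta = -lr * grad                             # change an ideal device would apply
        if not ideal:
            # Non-ideal device: the requested change is scaled by a direction-dependent response.
            delta *= response_up(w) if delta > 0 else response_down(w)
        w += delta
        if t >= burn_in:
            total += w
    return total / (steps - burn_in)

print(f"ideal device:       {train(ideal=True):.3f}")
print(f"soft-bounds device: {train(ideal=False):.3f}")
```

With noisy gradients, the ideal device hovers around `w_star`. The soft-bounds device instead settles between the symmetric point and `w_star`, because the asymmetry turns zero-mean gradient noise into a systematic drift. This is one illustration of why response functions matter for training dynamics.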
R-KV: Redundancy-aware KVCache Compression for Reasoning Models
Reasoning models have demonstrated impressive performance in self-reflection and chain-of-thought reasoning. However, they often produce excessively long outputs, leading to prohibitively large key-value (KV) caches during inference. While chain-of-thought inference significantly improves performance on complex reasoning tasks, it can also lead to reasoning failures when deployed with existing KV cache compression approaches. To address this, we propose Redundancy-aware KVCache Compression for Reasoning models (R-KV), a novel method that specifically targets redundant tokens in reasoning models. Our method preserves nearly 100% of full KV cache performance using only 10% of the KV cache. By comparison, existing KV cache baselines reach only 60% of that performance. Remarkably, R-KV even achieves 105% of full KV cache performance with 16% of the KV cache. This KV cache reduction also yields a 90% memory saving and a 6.6× throughput improvement over standard chain-of-thought reasoning inference. Experimental results show that R-KV consistently outperforms existing KV cache compression baselines on two mathematical reasoning datasets.
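The abstract does not spell out R-KV's scoring rule. The sketch below therefore swaps in a generic redundancy-aware selection for illustration. It is a greedy pick in the style of maximal marginal relevance that balances a per-token importance score against the key's cosine similarity to tokens already kept. The function name `select_tokens`, the weight `lam`, and the use of precomputed importance scores (for example, attention received) are assumptions, not the paper's method.

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def _score(i: int, kept: List[int], keys: Sequence[Sequence[float]],
           importance: Sequence[float], lam: float) -> float:
    # Redundancy = highest similarity to any key already kept (0 before the first pick).
    redundancy = max((cosine(keys[i], keys[j]) for j in kept), default=0.0)
    return lam * importance[i] - (1.0 - lam) * redundancy

def select_tokens(keys: Sequence[Sequence[float]], importance: Sequence[float],
                  budget: int, lam: float = 0.5) -> List[int]:
    """Return sorted indices of at most `budget` cached tokens to keep."""
    n = len(keys)
    if budget >= n:
        return list(range(n))
    kept: List[int] = []
    remaining = list(range(n))
    while len(kept) < budget:
        best = max(remaining, key=lambda i: _score(i, kept, keys, importance, lam))
        kept.append(best)
        remaining.remove(best)
    return sorted(kept)
```

Consider keys `[1, 0]`, `[1, 0]`, `[0, 1]` with importances `0.9, 0.8, 0.5` and a budget of 2. Plain top-k by importance would keep the two identical keys. `select_tokens` keeps tokens 0 and 2 instead, because the duplicate is penalized once its twin is in the cache.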
Linear-Memory and Decomposition-Invariant Linearly Convergent Conditional Gradient Algorithm for Structured Polytopes
Recently, several works have shown that natural modifications of the classical conditional gradient method (aka the Frank-Wolfe algorithm) for constrained convex optimization provably converge at a linear rate when the feasible set is a polytope and the objective is smooth and strongly convex. However, all of these results suffer from two significant shortcomings:

(i) a large memory requirement, due to the need to store an explicit convex decomposition of the current iterate, and, as a consequence, a large running-time overhead per iteration;

(ii) a worst-case convergence rate that depends unfavorably on the dimension.

In this work we present a new conditional gradient variant and a corresponding analysis that improves on both of these shortcomings. In particular, both memory and computation overheads are only linear in the dimension. In addition, when the optimal solution is sparse, the new convergence rate replaces a factor that is at least linear in the dimension in previous works. The replacement is a linear dependence on the number of non-zeros in the optimal solution. At the heart of our method and the corresponding analysis is a novel way to compute decomposition-invariant away-steps. Our theoretical guarantees do not apply to arbitrary polytopes. They do, however, apply to several important structured polytopes that capture central concepts such as paths in graphs, perfect matchings in bipartite graphs, marginal distributions that arise in structured prediction tasks, and more. Our theoretical findings are complemented by empirical evidence showing that our method delivers state-of-the-art performance.
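On the probability simplex, the iterate's coordinates are already its convex weights over the vertices. An away vertex can therefore be read directly from the iterate's support instead of from a separately stored decomposition, so memory stays linear in the dimension. The sketch below uses that observation for a pairwise (away-and-toward) conditional gradient step on a simple quadratic. It is an illustrative special case under these assumptions, not the paper's algorithm for general structured polytopes. The function name and objective are hypothetical.

```python
from typing import List, Tuple

def pairwise_fw_simplex(y: List[float], x0: List[float], max_iter: int = 1000,
                        tol: float = 1e-12) -> Tuple[List[float], int]:
    """Minimize f(x) = 0.5 * ||x - y||^2 over the probability simplex; returns (x, iterations)."""
    x = list(x0)
    n = len(x)
    for it in range(max_iter):
        g = [x[i] - y[i] for i in range(n)]                      # gradient of f at x
        s = min(range(n), key=lambda i: g[i])                    # Frank-Wolfe (toward) vertex e_s
        # Away vertex: worst gradient coordinate among vertices with positive weight.
        # No decomposition is stored; the support of x itself plays that role.
        a = max((i for i in range(n) if x[i] > 0), key=lambda i: g[i])
        gap = g[a] - g[s]                                        # pairwise optimality gap
        if gap <= tol:
            return x, it
        # Exact line search along d = e_s - e_a (||d||^2 = 2), capped so x_a stays >= 0.
        step = min(gap / 2.0, x[a])
        x[s] += step
        x[a] -= step
    return x, max_iter
```

For example, `pairwise_fw_simplex([2.0, 0.0, 0.0], [1/3, 1/3, 1/3])` reaches `[1.0, 0.0, 0.0]` after two steps. Each of those steps is a "drop step" that removes a coordinate from the support.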
Amazon just unleashed its Cyber Monday laptop deals and it's dropping prices on MacBooks, gaming PCs, and more
Whether you need a basic everyday driver or a full-featured gaming PC, Amazon's Cyber Monday laptop deals can save you cash. A laptop is a big investment. Not only does one typically cost a lot of money, but you're also committing to a machine you'll stare at while you shop, do homework, work remotely, game, and pretty much everything else in your online life. Amazon just dropped its Cyber Monday deals on laptops, and these are some of the lowest prices we have seen all year.
Black Friday 2025 could be your last chance for cheap PC deals, experts warn
AI is causing a DRAM apocalypse, and it's affecting the whole PC market this holiday season. This year, Black Friday tech shoppers should heed one important message: don't wait, buy now. Certain components are skyrocketing in price, and the situation is expected to get even worse. DRAM prices, for example, have doubled in little more than a month. AI hyperscalers have snapped up whatever they can buy.
Zero-Knowledge Proofs in Sublinear Space
Zero-knowledge proofs allow verification of computations without revealing private information. However, existing systems require memory proportional to the computation size, which has historically limited their use in large-scale applications and on mobile and edge devices. We solve this fundamental bottleneck by developing, to our knowledge, the first proof system with sublinear memory requirements for mainstream cryptographic constructions. Our approach processes computations in blocks using a space-efficient tree algorithm. This reduces memory from linear to square-root scaling (from $\Theta(T)$ to $O(\sqrt{T} + \log T \log\log T)$ for computation size $T$) while maintaining the same proof generation time through a constant number of streaming passes. For widely used linear polynomial commitment schemes (KZG/IPA), our method produces identical proofs and verification when using the same parameters and hashing only aggregate commitments into the challenge generation, preserving proof size and security. Hash-based systems also achieve square-root memory scaling, though with slightly different proof structures. This advance enables zero-knowledge proofs on everyday devices and makes previously infeasible large computations verifiable, fundamentally democratizing access to privacy-preserving computation. Space-efficient zero-knowledge proof systems create opportunities to reshape how trust is established in digital systems, from enabling widespread participation in decentralized networks to making verifiable scientific computing practical at unprecedented scales.
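The block-wise idea can be illustrated with a toy linear "commitment": evaluating the polynomial with coefficients a_0, ..., a_{T-1} at a secret point s modulo a prime. This is the quantity KZG computes in the exponent. Because the evaluation is linear, each block of about √T coefficients can be evaluated on its own and then shifted by s raised to the block's starting index. As a result, only one block needs to sit in memory at a time.

This is a sketch under stated assumptions: a toy modulus, a hypothetical `coefficient_stream` that regenerates the trace on demand, no group operations, and no security. It is not the paper's tree algorithm.

```python
import math
from typing import Iterator, List, Tuple

P = 2**61 - 1  # toy prime modulus (assumption; real schemes work in a pairing-friendly group)

def coefficient_stream(T: int) -> Iterator[int]:
    """Hypothetical stand-in for the prover's trace: regenerates T field elements on demand."""
    for i in range(T):
        yield (i * i + 7) % P

def horner(block: List[int], s: int) -> int:
    """Evaluate sum(block[j] * s**j) mod P."""
    acc = 0
    for a in reversed(block):
        acc = (acc * s + a) % P
    return acc

def eval_direct(coeffs: List[int], s: int) -> int:
    """Reference evaluation with the whole coefficient vector in memory (Theta(T) space)."""
    return horner(coeffs, s)

def eval_blocked(stream: Iterator[int], T: int, s: int) -> Tuple[int, int]:
    """Same value, buffering at most isqrt(T) coefficients; returns (value, peak buffer size)."""
    b = max(1, math.isqrt(T))
    total, shift, peak = 0, 1, 0     # shift = s**(index of the current block's first coefficient)
    block: List[int] = []
    for a in stream:
        block.append(a)
        peak = max(peak, len(block))
        if len(block) == b:
            # Linearity: the block's local evaluation, shifted by s**start, adds into the total.
            total = (total + shift * horner(block, s)) % P
            shift = shift * pow(s, b, P) % P
            block = []
    if block:                        # final partial block
        total = (total + shift * horner(block, s)) % P
    return total, peak
```

The blocked and direct evaluations agree exactly, while the peak buffer stays at `isqrt(T)` coefficients.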
Memory-Efficient FastText: A Comprehensive Approach Using Double-Array Trie Structures and Mark-Compact Memory Management
FastText has established itself as a fundamental algorithm for learning word representations, demonstrating exceptional capability in handling out-of-vocabulary words through character-level n-gram embeddings. However, its hash-based bucketing mechanism introduces critical limitations for large-scale industrial deployment. Hash collisions cause semantic drift, and memory requirements become prohibitively expensive for real-world vocabularies containing millions of terms. This paper presents a comprehensive memory optimization framework that fundamentally reimagines FastText's memory management by integrating double-array trie (DA-trie) structures with mark-compact garbage collection principles.

Our approach leverages the linguistic insight that n-grams sharing common prefixes or suffixes exhibit highly correlated embeddings due to co-occurrence patterns in natural language. By systematically identifying and merging semantically similar embeddings based on these structural relationships, we achieve compression ratios of 4:1 to 10:1 while maintaining near-perfect embedding quality. The algorithm consists of four phases:
- prefix trie construction with embedding mapping;
- prefix-based similarity compression;
- suffix-based similarity compression;
- mark-compact memory reorganization.

Comprehensive experiments on a Chinese vocabulary of 30 million terms demonstrate a memory reduction from over 100GB to approximately 30GB with negligible performance degradation. Our industrial deployment results show significant cost reduction, faster loading times, and improved model reliability through the elimination of hash collision artifacts. Code and experimental implementations are available at: https://github.com/initial-d/me_fasttext
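The last two phases of the pipeline merge similar embeddings and then reclaim the freed slots. They can be sketched in a few lines of Python. For brevity, the sketch groups n-grams by a fixed-length prefix in a plain dict rather than a double-array trie. The prefix length and cosine threshold are hypothetical parameters. The mark-compact step works in three parts:
- mark the slots still referenced by some n-gram;
- slide those slots to the front of the table in order;
- rewrite every reference through a forwarding table.

```python
import math
from typing import Dict, List

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def merge_by_prefix(index: Dict[str, int], table: List[List[float]],
                    prefix_len: int, threshold: float) -> None:
    """Point each n-gram at an earlier similar slot in its prefix group (mutates `index`)."""
    groups: Dict[str, List[str]] = {}
    for ngram in sorted(index):
        groups.setdefault(ngram[:prefix_len], []).append(ngram)
    for members in groups.values():
        reps: List[int] = []                     # slots kept as representatives in this group
        for ngram in members:
            slot = index[ngram]
            for rep in reps:
                if cosine(table[slot], table[rep]) >= threshold:
                    index[ngram] = rep           # share the representative's embedding
                    break
            else:
                reps.append(slot)                # no similar slot yet: this one stays live

def mark_compact(index: Dict[str, int], table: List[List[float]]) -> List[List[float]]:
    """Return a compacted embedding table and rewrite `index` to the new slot numbers."""
    live = sorted(set(index.values()))                       # mark: slots still referenced
    forward = {old: new for new, old in enumerate(live)}     # forwarding addresses
    compacted = [table[old] for old in live]                 # compact, preserving order
    for ngram in index:
        index[ngram] = forward[index[ngram]]                 # update references
    return compacted
```

After merging, unreferenced slots are simply never marked, so compaction reclaims them without scanning for holes.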
Bayesian Reasoning Enabled by Spin-Orbit Torque Magnetic Tunnel Junctions
Xu, Yingqian; Li, Xiaohan; Wan, Caihua; Zhang, Ran; He, Bin; Liu, Shiqiang; Xia, Jihao; Kong, Dehao; Xiong, Shilong; Yu, Guoqiang; Han, Xiufeng
The rapid development of artificial intelligence (AI) over the past few decades has been nourished by advances in machine learning algorithms, increased computational power, and the availability of vast amounts of data[1]. This development has in turn revolutionized numerous fields including, but not limited to, medical science and healthcare, information technology, finance, transportation, and more. This regenerative feedback between AI and its applications drives further explosive growth of data and expansion of model scales. It calls for a paradigm shift toward efficient and fast computing and memory technologies, especially advanced algorithms and emerging AI hardware enabled by nonvolatile memories[2]. In this regard, emerging memory technologies such as magnetic random-access memories[3], ferroelectric random-access memories[4], resistive random-access memories[5, 6], and phase-change random-access memories[7] have been used to accelerate AI computing, for instance matrix multiplication[8]. Spintronic devices based on spin-orbit torques (SOTs) are one prominent example among these emerging memories. Thanks to their high energy efficiency, fast speed, long endurance, and versatile functionalities, they have shown great potential for hardware-accelerated true random number generation (TRNG)[9-18] in addition to matrix multiplication. For instance, high-quality true random number generators with stable and reconfigurable probability tunability have been demonstrated using SOT magnetic tunnel junctions (SOT-MTJs)[19-21].
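A probability-tunable random bit of the kind described above can be emulated in software to show how such bits support Bayesian reasoning by sampling. The sketch makes three illustrative choices:
- it assumes a sigmoid dependence of switching probability on write current, with made-up parameters `I50_UA` and `WIDTH_UA`;
- it picks the current for a target probability by inverting that sigmoid;
- it uses three such bits to estimate a posterior in a toy two-node network by rejection sampling.

None of these names, values, or the network come from the paper.

```python
import math
import random

I50_UA = 500.0    # hypothetical current (uA) giving 50% switching probability
WIDTH_UA = 25.0   # hypothetical width (uA) of the sigmoid switching curve

def switching_probability(current_ua: float) -> float:
    """Assumed sigmoid model of an SOT-MTJ's switching probability versus current."""
    return 1.0 / (1.0 + math.exp(-(current_ua - I50_UA) / WIDTH_UA))

def current_for_probability(p: float) -> float:
    """Invert the sigmoid to choose the pulse amplitude that yields probability p (0 < p < 1)."""
    return I50_UA + WIDTH_UA * math.log(p / (1.0 - p))

class PBit:
    """Software stand-in for a probability-tunable random bit driven at a fixed current."""
    def __init__(self, p: float, rng: random.Random):
        self.current = current_for_probability(p)
        self.rng = rng

    def sample(self) -> int:
        return 1 if self.rng.random() < switching_probability(self.current) else 0

def posterior_rain_given_wet(n: int, seed: int = 0) -> float:
    """Rejection-sampling estimate of P(rain | wet grass) in a toy two-node network."""
    rng = random.Random(seed)
    rain = PBit(0.2, rng)           # prior P(rain) = 0.2 (toy value)
    wet_if_rain = PBit(0.9, rng)    # P(wet | rain) = 0.9 (toy value)
    wet_if_dry = PBit(0.1, rng)     # P(wet | no rain) = 0.1 (toy value)
    accepted = rainy = 0
    for _ in range(n):
        r = rain.sample()
        w = wet_if_rain.sample() if r else wet_if_dry.sample()
        if w:                        # keep only samples consistent with the evidence
            accepted += 1
            rainy += r
    return rainy / accepted
```

With these toy values, the exact posterior is 0.2·0.9 / (0.2·0.9 + 0.8·0.1) ≈ 0.692. The sampled estimate should approach it as `n` grows.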
AMD Radeon RX 9070 and 9070 XT review: The new 1440p gaming champions
Some software bugs mar the experience, but overall, AMD's 9070 graphics cards offer such a compelling mix of performance, value, and memory capacity that it's worth accepting those quibbles. Nvidia fumbled the ball with its $549 GeForce RTX 5070, and AMD's new Radeon RX 9070 and 9070 XT are primed to seize the advantage. The RTX 5070, hitting store shelves today, is a good 1440p graphics card but a stagnant generational sidegrade at best. Enter the $549 Radeon RX 9070 and $599 Radeon RX 9070 XT, launching tomorrow. Both cards are faster than the RTX 5070, with the 9070 XT going toe-to-toe with the $750 RTX 5070 Ti in many games, and each includes an ample 16GB of VRAM.